Filtering spam e-mail from mixed arabic and english messages: a comparison of machine learning techniques
نویسنده
چکیده
Spam is one of the main problems in emails communications. As the volume of non-english language spam increases, little work is done in this area. For example, in Arab world users receive spam written mostly in arabic, english or mixed Arabic and english. To filter this kind of messages, this research applied several machine learning techniques. Many researchers have used machine learning techniques to filter spam email messages. This study compared six supervised machine learning classifiers which are maximum entropy, decision trees, artificial neural nets, naïve bayes, support system machines and k-nearest neighbor. The experiments suggested that words in Arabic messages should be stemmed before applying classifier. In addition, in most cases, experiments showed that classifiers using feature selection techniques can achieve comparable or better performance than filters do not used them.
منابع مشابه
A Classification Method for E-mail Spam Using a Hybrid Approach for Feature Selection Optimization
Spam is an unwanted email that is harmful to communications around the world. Spam leads to a growing problem in a personal email, so it would be essential to detect it. Machine learning is very useful to solve this problem as it shows good results in order to learn all the requisite patterns for classification due to its adaptive existence. Nonetheless, in spam detection, there are a large num...
متن کاملUsing Machine Learning Algorithms for Automatic Cyber Bullying Detection in Arabic Social Media
Social media allows people interact to express their thoughts or feelings about different subjects. However, some of users may write offensive twits to other via social media which known as cyber bullying. Successful prevention depends on automatically detecting malicious messages. Automatic detection of bullying in the text of social media by analyzing the text "twits" via one of the machine l...
متن کاملMachine Learning for E-mail Spam Filtering: Review, Techniques and Trends
We present a comprehensive review of the most effective content-based e-mail spam filtering techniques. We focus primarily on Machine Learning-based spam filters and their variants, and report on a broad review ranging from surveying the relevant ideas, efforts, effectiveness, and the current progress. The initial exposition of the background examines the basics of e-mail spam filtering, the ev...
متن کاملA Survey on Machine Learning Methods in Spam Filtering
Email spam or junk e-mail (unwanted e-mail “usually of a commercial nature sent out in bulk”) is one of the major issue of the today's Internet, that cause financial damage to companies and annoying individual users. Among the approaches developed to stop spam, filtering is an important and popular one. Common uses for mail filters include organizing incoming email and removal of spam and compu...
متن کاملTwo Approaches on Implementation of CBR and CRM Technologies to the Spam Filtering Problem
Recently the number of undesirable messages coming to e-mail has strongly increased. As spam has changeable character the anti-spam systems should be trainable and dynamical. The machine learning technology is successfully applied in a filtration of e-mail from undesirable messages for a long time. In this paper it is offered to apply Case Based Reasoning technology to a spam filtering problem....
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Int. Arab J. Inf. Technol.
دوره 6 شماره
صفحات -
تاریخ انتشار 2009